Deep learning based approach to unstructured record linkage

Abstract

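Purpose: In the world of big data, data integration technology is crucial for maximising the capability of data-driven decision-making. Integrating data from multiple sources drastically expands the power of information and allows us to address questions that are impossible to answer using a single data source. Record Linkage (RL) is the task of identifying and linking records from multiple sources that describe the same real-world object (e.g. a person), and it plays a crucial role in the data integration process. RL is challenging, as it is uncommon for different data sources to share a unique identifier. Hence, the records must be matched based on the comparison of their corresponding values. Most of the existing RL techniques assume that records across different data sources are structured and represented by the same scheme (i.e. set of attributes). Given the increasing amount of heterogeneous data sources, those assumptions are rather unrealistic. The purpose of this paper is to propose a novel RL model for unstructured data.

Design/methodology/approach: In the previous work (Jurek-Loughrey, 2020), the authors proposed a novel approach to linking unstructured data based on the application of the Siamese Multilayer Perceptron model. It was demonstrated that the method performed on par with other approaches that make constraining assumptions regarding the data. This paper expands the previous work, originally presented at iiWAS2020 [16], by exploring new architectures of the Siamese Neural Network, which improves the generalisation of the RL model and makes it less sensitive to parameter selection.

Findings: The experimental results confirm that the new Autoencoder-based architecture of the Siamese Neural Network obtains better results in comparison to the Siamese Multilayer Perceptron model proposed in (Jurek et al., 2020). Better results have been achieved in three out of four data sets. Furthermore, it has been demonstrated that the second proposed (hybrid) architecture, based on integrating the Siamese Autoencoder with the Siamese Multilayer Perceptron model, makes the model more stable in terms of parameter selection.

Originality/value: To address the problem of unstructured RL, this paper presents a new deep learning based approach to improve the generalisation of the Siamese Multilayer Perceptron model and make it less sensitive to parameter selection.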
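To make the Siamese idea in the abstract concrete, the following is a minimal, untrained sketch in Python and not the authors' model. It assumes each unstructured record is a single text string, turns it into a fixed-length vector by hashing character trigrams, and passes both records through one shared set of weights before comparing the resulting embeddings with cosine similarity. The names `featurise`, `SiameseMLP`, the layer sizes and the tanh activation are all illustrative assumptions; the abstract does not specify the feature extraction, architecture details or training procedure.

```python
import math
import random
import zlib

# Illustrative sizes only; none of these values come from the paper.
DIM, HIDDEN, EMBED = 64, 16, 8


def featurise(record):
    """Hash the character trigrams of a record into a fixed-length count vector."""
    text = " ".join(record.lower().split())
    vec = [0.0] * DIM
    for i in range(max(len(text) - 2, 1)):
        bucket = zlib.crc32(text[i:i + 3].encode("utf-8")) % DIM
        vec[bucket] += 1.0
    return vec


def init_layer(rng, n_in, n_out):
    """Random weight matrix with n_out rows and n_in columns."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def dense(weights, x, activate):
    """Matrix-vector product, optionally followed by tanh."""
    out = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return [math.tanh(o) for o in out] if activate else out


class SiameseMLP:
    """Two branches sharing one set of weights (forward pass only, untrained)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w1 = init_layer(rng, DIM, HIDDEN)
        self.w2 = init_layer(rng, HIDDEN, EMBED)

    def embed(self, record):
        # The same w1 and w2 are used for every record: this is the "Siamese" part.
        hidden = dense(self.w1, featurise(record), activate=True)
        return dense(self.w2, hidden, activate=False)

    def similarity(self, a, b):
        """Cosine similarity between the embeddings of two records."""
        ea, eb = self.embed(a), self.embed(b)
        dot = sum(x * y for x, y in zip(ea, eb))
        norm = math.sqrt(sum(x * x for x in ea)) * math.sqrt(sum(y * y for y in eb))
        return dot / norm if norm else 0.0
```

In a trained model, the shared weights would be fitted on labelled matching and non-matching record pairs so that matching pairs score higher; with the random weights used here the scores only illustrate the data flow. A call such as `SiameseMLP().similarity("John Smith, 12 High Street", "Smith John 12 High St")` returns a value between -1 and 1.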

Similar resources

Electre Tri-Machine Learning Approach to the Record Linkage Problem

In this short paper, the Electre Tri-Machine Learning Method, generally used to solve ordinal classification problems, is proposed for solving the Record Linkage problem. Preliminary experimental results show that, using the Electre Tri method, high accuracy can be achieved and more than 99% of the matches and nonmatches were correctly identified by the procedure.

Implementing a Bayesian Approach to Record Linkage

The Census Coverage Measurement survey-based program estimated household population coverage of the 2010 Decennial Census. Calculating coverage estimates required linking survey person data to census enumerations. For record linkage research, we applied a Bayesian Latent Class Models approach to both 2010 coverage survey data and simulated household data. This paper presents our use of Base SAS...

Validating Distance-Based Record Linkage with Probabilistic Record Linkage

This work compares two alternative methods for record linkage: distance based and probabilistic record linkage. It compares the performance of both approaches when data is categorical. To that end, a distance over ordinal and nominal scales is defined. The paper shows that, for categorical data, distance-based and probabilistic-based record linkage lead to similar results in relation to the num...

Behavior Based Record Linkage

In this paper, we present a new record linkage approach that uses entity behavior to decide if potentially different entities are in fact the same. An entity’s behavior is extracted from a transaction log that records the actions of this entity with respect to a given data source. The core of our approach is a technique that merges the behavior of two possible matched entities and computes the ...

Supervised learning approach for distance based record linkage as disclosure risk evaluation

In data privacy, record linkage is a well known technique to evaluate the disclosure risk of protected data. It is used to evaluate the number of linked records between a data set and its protected version. In this paper we give an overview of the work that we have been doing during the last months. We describe the development of a supervised learning method for distance-based record linkage, w...

Journal

Journal title: International Journal of Web Information Systems

Year: 2021

ISSN: 1744-0092, 1744-0084

DOI: https://doi.org/10.1108/ijwis-05-2021-0058